
    Streaming Infrastructure and Natural Language Modeling with Application to Streaming Big Data

    Streaming data are produced at high velocity and in great variety. The vision of this research is to build an end-to-end system that handles the collection, curation and analysis of streaming data. The streaming data used in this thesis contain both numeric and text data. First, in the field of data collection, we design and evaluate a data delivery framework that handles the real-time nature of streaming data. In this component, we use streaming data from the automotive domain, since it is suitable for testing and evaluating our data delivery system. Second, in the field of data curation, we use a language model to analyze two online automotive forums as an example of streaming text data curation. Last but not least, we present our approach for automated query expansion on Twitter data as an example of streaming social media data analysis. This thesis provides a holistic view of the end-to-end system we have designed, built and analyzed. To study streaming data in the automotive domain, a complex and massive amount of data is being collected from on-board sensors of operational connected vehicles (CVs), infrastructure data sources such as roadway sensors and traffic signals, mobile data sources such as cell phones, social media sources such as Twitter, and news and weather data services. Unfortunately, these data create a bottleneck at data centers for the processing and retrieval of collected data, and require the deployment of additional message transfer infrastructure between data producers and consumers to support diverse CV applications. In the first part of this dissertation, we present a strategy for creating an efficient and low-latency distributed message delivery system for CV systems using a distributed message delivery platform.
This strategy enables large-scale ingestion, curation, and transformation of unstructured data (roadway traffic-related and roadway non-traffic-related data) into labeled and customized topics for a large number of subscribers or consumers, such as CVs, mobile devices, and data centers. We evaluate the performance of this strategy by developing a prototype infrastructure using Apache Kafka, an open source message delivery system, and compare its performance with the latency requirements of CV applications. We present experimental results of the message delivery infrastructure on two different distributed computing testbeds at Clemson University. Experiments were performed to measure the latency of the message delivery system for a variety of testing scenarios. These experiments reveal that measured latencies are less than the U.S. Department of Transportation recommended latency requirements for CV applications, which provides evidence that the system is capable of managing CV-related data distribution tasks. Human-generated streaming data are large in volume and noisy in content. Direct acquisition of the full scope of human-generated data is often ineffective. In our research, we try to find an alternative resource to study such data. Common Crawl is a massive multi-petabyte dataset hosted by Amazon. It contains archived HTML web page data from 2008 to date. Common Crawl has been widely used for text mining purposes. Using data extracted from Common Crawl has several advantages over a direct crawl of web data, among which is removing the likelihood of a user's home IP address becoming blacklisted for accessing a given web site too frequently. However, Common Crawl is a data sample, and so questions arise about the quality of Common Crawl as a representative sample of the original data. We perform systematic tests on the similarity of topics estimated from Common Crawl compared to topics estimated from the full data of online forums.
Our target is online discussions from a user forum for car enthusiasts, but our research strategy can be applied to other domains and samples to evaluate the representativeness of topic models. We show that topic proportions estimated from Common Crawl are not significantly different from those estimated on the full data. We also show that topics are similar in terms of their word compositions, and not worse than topic similarity estimated under true random sampling, which we simulate through a series of experiments. Our research will be of interest to analysts who wish to use Common Crawl to study topics of interest in user forum data, and analysts applying topic models to other data samples. Twitter data is another example of high-velocity streaming data. We use it as an example to study the query expansion application in streaming social media data analysis. Query expansion is a problem concerned with gathering more relevant documents from a given set that cover a certain topic. In this thesis we outline a number of tools for a query expansion system that will allow its user to gather more relevant documents (in this case, tweets from the Twitter social media system) while filtering out irrelevant documents. These tools include a method for triggering a given query expansion using a Jaccard similarity threshold between keywords, and a query expansion method using archived news reports to create a vector space of novel keywords. By the nature of streaming data, the Twitter stream contains emerging events that are constantly changing and therefore cannot be predicted using static queries, since the keywords used in a static query often mismatch the words used in topics around emerging events. To solve this problem, our proposed approach of automated query expansion first detects the emerging events. Then we combine both local analysis and global analysis methods to generate queries for capturing the emerging topics.
Experimental results show that by combining the global analysis and local analysis methods, our approach can capture the semantic information in the emerging events with high efficiency.
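
The Jaccard-threshold trigger mentioned in this abstract can be illustrated with a minimal sketch. The abstract gives neither the threshold value nor the direction of the comparison, so both are assumptions here: the function names, the default threshold of 0.3, and the rule of triggering expansion when similarity falls below the threshold are hypothetical and are not taken from the thesis.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two keyword collections."""
    a, b = set(a), set(b)
    if not a and not b:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


def should_expand(query_keywords, stream_keywords, threshold=0.3):
    """Hypothetical trigger: expand the query when it overlaps too little
    with the keywords currently observed in the stream.

    The threshold value and the 'below threshold' rule are assumptions.
    """
    return jaccard(query_keywords, stream_keywords) < threshold


if __name__ == "__main__":
    static_query = ["flood", "rain"]
    print(should_expand(static_query, ["flood", "rain", "storm"]))
    print(should_expand(static_query, ["earthquake", "rescue"]))
```

In this sketch a static query that still shares most of its keywords with the stream is left alone, while a query with no overlap triggers expansion.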

    Disruption of the Gene Encoding Endo-β-1, 4-Xylanase Affects the Growth and Virulence of Sclerotinia sclerotiorum

    Sclerotinia sclerotiorum (Lib.) de Bary is a devastating fungal pathogen with worldwide distribution. S. sclerotiorum is a necrotrophic fungus that secretes many cell wall-degrading enzymes (CWDEs) that destroy plant cell-wall components. Functional analyses of the genes that encode CWDEs will help explain the mechanisms of growth and pathogenicity of S. sclerotiorum. Here, we isolated and characterized a gene, SsXyl1, that encodes an endo-β-1, 4-xylanase in S. sclerotiorum. SsXyl1 expression showed a slight increase during the development and germination stages of sclerotia and a dramatic increase during infection. The expression of SsXyl1 was induced by xylan. The SsXyl1 deletion strains produced aberrant sclerotia that could not germinate to form apothecia. The SsXyl1 deletion strains also lost virulence to the hosts. This study demonstrates the important roles of endo-β-1, 4-xylanase in the growth and virulence of S. sclerotiorum.

    Edge-Mediated Skyrmion Chain and Its Collective Dynamics in a Confined Geometry

    The emergence of a topologically nontrivial vortex-like magnetic structure, the magnetic skyrmion, has launched new concepts for memory devices. Extensive studies have theoretically demonstrated the ability to encode information bits by using a chain of skyrmions in one-dimensional nanostripes. Here, we report the first experimental observation of the skyrmion chain in FeGe nanostripes by using high-resolution Lorentz transmission electron microscopy. Under an applied field normal to the nanostripe plane, we observe that the helical ground states with distorted edge spins evolve into individual skyrmions, which assemble in the form of a chain at low field and move collectively into the center of the nanostripes at elevated field. Such a skyrmion chain survives even when the width of the nanostripe is much larger than the single skyrmion size. This discovery demonstrates a new way of skyrmion formation through the edge effect, and might, in the long term, shed light on applications. Comment: 7 pages, 3 figures

    Direct imaging of a zero-field target skyrmion and its polarity switch in a chiral magnetic nanodisk

    A target skyrmion is a flux-closed spin texture that has two-fold degeneracy and is promising as a binary state in next-generation universal memories. Although its formation in nanopatterned chiral magnets has been predicted, its observation has remained challenging. Here, we use off-axis electron holography to record images of target skyrmions in a 160-nm-diameter nanodisk of the chiral magnet FeGe. We compare experimental measurements with numerical simulations, demonstrate switching between two stable degenerate target skyrmion ground states that have opposite polarities and rotation senses, and discuss the observed switching mechanism. Comment: 18 pages, 4 figures